
    Towards a discipline of performance engineering: lessons learned from stencil kernel benchmarks

    High-performance computing (HPC) systems are characterized by a high level of complexity on both the hardware and the software side. The hardware has evolved to offer enormous compute power, at the cost of an increasing effort needed to program the systems, whose software stack can be managed correctly only by means of ad hoc tools. Reproducibility has always been one of the cornerstones of science, but it is severely challenged by the complex ecosystem of software packages that run on HPC platforms, as well as by malpractice in describing the configurations adopted in experiments. In this work, we first characterize the factors that affect the reproducibility of experiments in the field of high-performance computing, and then define a taxonomy of experiments and of the levels of reproducibility that can be achieved, following the guidelines of a framework that we present. A tool that implements this framework is described and used to conduct performance-engineering experiments on kernels containing the stencil (structured-grid) computational pattern. Owing to the growing architectural complexity of new compute systems and of the software that runs on them, the gap between expected and achieved performance is widening. Performance engineering, with its cycle of prediction, reproducible measurement, and optimization, is critical to closing this gap. A selection of stencil kernels is first modeled and their performance predicted through a grey-box analysis, then compared against the reproducible measurements. The prediction is used to validate the measured performance and vice versa, resulting in a "Gold Standard" that draws a path towards a discipline of performance engineering.
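
    Several of the outputs below benchmark the stencil (structured-grid) pattern without showing one. Purely as an illustration (not a kernel taken from the paper), a minimal 2-D five-point Jacobi sweep in C with OpenMP makes the pattern's key property concrete: each update performs 4 flops against roughly 24 bytes of memory traffic, an arithmetic intensity of only about 0.17 flop/byte, so the kernel is memory-bound on typical machines.

        /* One sweep of a 2-D five-point Jacobi stencil on an n x n grid
         * (row-major layout, boundary frame held fixed). Low arithmetic
         * intensity: 4 flops per update vs. ~24 bytes of traffic. */
        void jacobi_sweep(int n, const double *restrict in, double *restrict out)
        {
            #pragma omp parallel for
            for (int i = 1; i < n - 1; ++i)
                for (int j = 1; j < n - 1; ++j)
                    out[i*n + j] = 0.25 * (in[(i-1)*n + j] + in[(i+1)*n + j]
                                         + in[i*n + j - 1] + in[i*n + j + 1]);
        }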

    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics are computationally demanding calculations. Parallelizing SPH codes is not trivial due to the absence of a structured grid. Additionally, the performance of SPH codes can in general be adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionality of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences between the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extracts the common basic SPH features of the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app. Moreover, the outcome serves as direct feedback to the parent codes, to improve their performance and overall scalability. (18 pages, 4 figures, 5 tables; 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1.)
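
    A purely illustrative sketch of why SPH is harder to parallelize than a grid code (generic textbook SPH in C, not code from SPHYNX, ChaNGa, or SPH-flow): density at each particle is a sum of kernel-weighted contributions from nearby particles, so there is no fixed grid to partition and neighbor sets change every step. Production codes replace the all-pairs loop below with trees or neighbor lists.

        #include <math.h>

        /* Standard cubic-spline smoothing kernel in 3-D. */
        static double W(double r, double h)
        {
            double q = r / h, s = 1.0 / (M_PI * h * h * h);
            if (q < 1.0) return s * (1.0 - 1.5*q*q + 0.75*q*q*q);
            if (q < 2.0) { double t = 2.0 - q; return s * 0.25 * t * t * t; }
            return 0.0;
        }

        /* Naive O(N^2) SPH density summation over particle positions
         * (x,y,z) with masses m and smoothing length h. */
        void density(int n, const double *x, const double *y, const double *z,
                     const double *m, double h, double *rho)
        {
            for (int i = 0; i < n; ++i) {
                double sum = 0.0;
                for (int j = 0; j < n; ++j) {
                    double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
                    sum += m[j] * W(sqrt(dx*dx + dy*dy + dz*dz), h);
                }
                rho[i] = sum;   /* no grid anywhere: neighbors must be searched */
            }
        }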

    SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App

    Numerical simulations of fluids in astrophysics and computational fluid dynamics (CFD) are among the most computationally demanding calculations in terms of sustained floating-point operations per second (FLOP/s). These numerical simulations are expected to benefit significantly from future Exascale computing infrastructures, which will perform 10^18 FLOP/s. The performance of SPH codes is, in general, adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. In this work an extensive study of three SPH implementations, SPHYNX, ChaNGa, and SPH-flow, is performed to gain insights and to expose the limitations and characteristics of the codes. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. We implemented a rotating square patch as a joint test simulation for the three SPH codes and analyzed their performance on a modern HPC system, Piz Daint. The performance profiling and scalability analysis conducted on the three parent codes exposed their performance issues, such as load imbalance at both the MPI and OpenMP levels. Two-level load balancing has been successfully applied to SPHYNX to overcome its load imbalance. The performance analysis shapes and drives the design of the SPH-EXA mini-app towards the use of efficient parallelization methods, fault-tolerance mechanisms, and load-balancing approaches. (arXiv admin note: substantial text overlap with arXiv:1809.0801.)
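
    A hedged sketch of what two-level load balancing can look like; the actual SPHYNX scheme is not detailed in the abstract, and the function names and the simple cost model below are assumptions. The outer level resizes each MPI rank's particle share from measured per-particle cost, while the inner OpenMP level lets dynamic scheduling absorb per-particle variation.

        #include <omp.h>

        /* Level 1 (across MPI ranks): size each rank's particle share
         * inversely to its measured time per particle from the last step.
         * Rounding-remainder handling is omitted for brevity. */
        void rebalance(int nranks, long ntotal, const double *t_per_particle,
                       long *share)
        {
            double inv_sum = 0.0;
            for (int r = 0; r < nranks; ++r) inv_sum += 1.0 / t_per_particle[r];
            for (int r = 0; r < nranks; ++r)
                share[r] = (long)(ntotal / (t_per_particle[r] * inv_sum));
        }

        /* Level 2 (within a rank): dynamic scheduling for uneven cost. */
        void step(int n, double *acc)
        {
            #pragma omp parallel for schedule(dynamic, 64)
            for (int i = 0; i < n; ++i)
                acc[i] = 0.0;   /* ... neighbor interactions of uneven cost ... */
        }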

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Background: Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres.

    Methods: This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and a global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low-middle-income countries.

    Results: In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients, except for reducing general anaesthesia (84 per cent) and re-sterilization of 'single-use' consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing the use of anaesthetic gases; and appropriate clinical waste processing. The top three shortlisted interventions for low-middle-income countries were: introducing reusable surgical devices; reducing the use of consumables; and reducing the use of general anaesthesia.

    Conclusion: This is a step toward environmentally sustainable operating environments, with actionable interventions applicable to both high-income and low-middle-income countries.

    Trusted high-performance computing in the classroom

    A well-designed high-performance computing (HPC) course not only presents theoretical parallelism concepts but also includes practical work on parallel systems. Today's machine models are diverse and, as a consequence, multiple programming models exist. The challenge for HPC course lecturers is to decide what to include and what to exclude. We have experience in teaching HPC in a multi-paradigm style. The practical course parts include message-passing programming using MPI, directive-based shared-memory programming using OpenMP, partitioned-global-address-space programming using Chapel, and domain-specific programming using a high-level framework. If these models were taught in isolation, students would have difficulty assessing the strengths and weaknesses of the approaches presented. We propose a project-based approach that introduces a specific problem to be solved (in our case a stencil computation) and asks for solutions using each of the programming approaches introduced. Our course has been taught successfully several times, but a major problem has always been checking the individual student solutions, especially deciding which reported performance results one can trust. In order to overcome these deficiencies, we have built a pedagogical tool which enhances the trust in students' work. In the paper we present the infrastructure and tools that make student experiments easily reproducible by lecturers. We introduce a taxonomy for general benchmark experiments, describe the distributed architecture of our development and analysis environment, and, as a case study, discuss performance experiments when solving a stencil problem in multiple programming models.
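
    To make the multi-paradigm project concrete, a minimal sketch of its message-passing ingredient, assuming C with MPI and a 1-D block decomposition (the function name and layout are illustrative, not the course's actual assignment):

        #include <mpi.h>

        /* One halo exchange for a 1-D block-decomposed stencil. Array u
         * holds n interior points plus one ghost cell on each side,
         * so valid indices run from 0 to n+1. */
        void exchange_halos(double *u, int n, int rank, int nprocs, MPI_Comm comm)
        {
            int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

            /* Send first interior point left; receive right ghost cell. */
            MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                         &u[n + 1], 1, MPI_DOUBLE, right, 0,
                         comm, MPI_STATUS_IGNORE);
            /* Send last interior point right; receive left ghost cell. */
            MPI_Sendrecv(&u[n],     1, MPI_DOUBLE, right, 1,
                         &u[0],     1, MPI_DOUBLE, left,  1,
                         comm, MPI_STATUS_IGNORE);
        }

    The OpenMP and Chapel variants of the same project would replace this explicit ghost-cell traffic with shared arrays or a global address space, which is exactly the contrast the course asks students to evaluate.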

    Reproducible Stencil Compiler Benchmarks Using PROVA!


    Reproducible experiments in parallel computing: concepts and stencil compiler benchmark study

    For decades, the majority of experiments on parallel computers have been reported at conferences and in journals, usually without any possibility to verify the presented results. Thus, one of the major principles of science, reproducible results as a kind of correctness proof, has been neglected in the field of experimental high-performance computing. While this is still the state of the art, current research is targeting solutions to this problem. We discuss early results regarding reproducibility from a benchmark case study. In our experiments we explore the class of stencil calculations that are part of many scientific kernels and compare the performance results of four stencil compilers. In order to make these experiments reproducible remotely, a first prototype of a replication engine has been developed that can be accessed via the internet.

    Reproducibility in Practice: Lessons Learned from Research and Teaching Experiments


    Performance output data and configurations of stencil compilers experiments run through PROVA!

    The data in this article are related to the research article titled "Reproducible Stencil Compiler Benchmarks Using PROVA!". Stencil kernels have been implemented using a naïve OpenMP (OpenMP Architecture Review Board, 2016) [1] parallelization and then using the stencil compilers PATUS (Christen et al., 2011) [2] and PLUTO (Bondhugula et al., 2008) [3]. Performance experiments have been run on different architectures using PROVA! (Guerrera et al., 2017) [4], a distributed workflow and system management tool for conducting reproducible research in the computational sciences. Information such as the compiler version, compilation flags, configurations, experiment parameters, and raw results is fundamental contextual information for the reproducibility of an experiment. All of this information is automatically stored by PROVA! and, for the experiments presented in this paper, is available at https://github.com/sguera/FGCS17

    Reproducible Benchmarking of Parallel Stencil Codes

    The state of the art in performance reporting in the high-performance computing field omits details that are needed in order to test and reuse the results, undermining what has been considered a pillar of science since the 17th century: the scientific method. Every scientist must be able to understand and extend the work of another. Modern architectures are extremely complex, and scientists often focus their efforts on what they are pursuing, ignoring the importance of making their science reproducible. We acknowledge the lack of time and the effort necessary to follow good practices in conducting research in computational science, yet consider reproducibility to be extremely important. For this reason, we first designed and implemented a software framework, PROVA!, to help scientists in their research, allowing them to focus on their core business while the framework takes care of reproducibility, and then used it in our own research on the benchmarking of parallel stencil codes.

    Stencil codes are an important and widely used pattern in computational science, but, due to their low arithmetic intensity, it is hard to achieve good performance with them. For this reason, several approaches to stencil computation have been proposed and several stencil compilers implemented. Starting from the problem of stencil computation, we dive into stencil compilers. How should they be compared? Many factors can affect the outcome of a comparison, such as the stencil itself (what if we change the stencil computation?) and the architecture (does compiler A always outperform compiler B, even on a different architecture?). How can they be evaluated while changing both the stencil to compute and the architecture? Using PROVA! we can compare the same stencil on different architectures, and different stencils on the same or different machines. Benchmarking performance models are needed to identify the relevant parameters. We plan to interpret the performance data by means of models such as the Execution-Cache-Memory (ECM) performance model, and to evaluate how performance is affected by explicitly pinning the threads, following different pinning strategies. The goal is to understand how the compilers behave in relation to the chosen architecture, and to state a subset of parameters to take into account in order to predict the final performance.

    Furthermore, we propose a standardized way of describing an experiment, extracting the minimum amount of information needed for reproducibility purposes. We try to address the questions: "Are the results of our research reproducible? Can other scientists trust our conclusions?". For this reason we conduct our research using PROVA!, a framework which strengthens and gives credibility to our results: through it we make available the source code, dependencies, environment, build process, running and post-processing scripts, and the raw data used for generating the graphs. Conducting research this way has several positive effects: one is left with complete documentation, it helps follow-up studies by allowing others to build upon existing work, and, in case of discrepancies during a replication, it helps in identifying and addressing the root of the problem.
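
    As a sketch of one possible pinning strategy mentioned above, assuming Linux and OpenMP ("compact" placement of thread t on core t is an illustrative choice, not necessarily a strategy the study evaluates):

        #define _GNU_SOURCE
        #include <sched.h>
        #include <omp.h>

        /* Pin each OpenMP thread to one core: thread t on core t.
         * A "scatter" strategy would instead spread threads across
         * sockets to maximize aggregate memory bandwidth. */
        void pin_threads_compact(void)
        {
            #pragma omp parallel
            {
                cpu_set_t set;
                CPU_ZERO(&set);
                CPU_SET(omp_get_thread_num(), &set);
                sched_setaffinity(0, sizeof(set), &set); /* 0 = calling thread */
            }
        }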